In this Take-home Exercise, I will explore the economic of the city of Engagement, Ohio USA.
Challenge 3: Economic considers the financial health of the city. Over time, are businesses growing or shrinking? How are people changing jobs? Are standards of living improving or declining over time?
Consider the financial status of Engagement’s businesses and residents, and use visual analytic techniques to address these questions.
In this take-home exercise, appropriate static and interactive statistical graphics methods are used to reveal the economic of the city of Engagement, Ohio USA while addressing the questions stated in the Task section.
The data are processed by using appropriate tidyverse family of packages and the statistical graphics are prepared using ggplot2 and its extensions.
The picture below shows a sketch of the initial design proposed.
Before we get started, it is important for us to ensure that the required R packages have been installed. If yes, we will load the R packages. If they have yet to be installed, we will install the R packages and load them onto R environment.
The chunk code below will do the trick.
packages = c('tidyverse', 'ggdist', 'ggridges', 'patchwork', 'ggthemes', 'lubridate', 'ggiraph', 'gganimate', 'plotly', 'DT', 'crosstalk')
for(p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p, character.only = T)
}
library(trelliscopejs)
The code chunk below imports TravelJournal.csv and
Employers.csv from the data folder into R by using read_csv()
of readr
package and save them as tibble data frames called travel and
employers.
travel <- read_csv("rawdata/TravelJournal.csv")
employers <- read_csv("rawdata/Employers.csv")
summary(travel)
participantId travelStartTime travelStartLocationId
Min. : 0.0 Min. :2022-03-01 05:00:00 Min. : 1
1st Qu.: 221.0 1st Qu.:2022-06-10 17:35:00 1st Qu.: 449
Median : 464.0 Median :2022-10-03 18:40:00 Median : 913
Mean : 480.5 Mean :2022-10-05 05:21:39 Mean :1016
3rd Qu.: 726.0 3rd Qu.:2023-01-28 06:20:00 3rd Qu.:1358
Max. :1010.0 Max. :2023-05-24 23:35:00 Max. :1805
NA's :1043
travelEndTime travelEndLocationId purpose
Min. :2022-03-01 05:35:00 Min. : 0 Length:2099656
1st Qu.:2022-06-10 18:10:00 1st Qu.: 449 Class :character
Median :2022-10-03 19:00:00 Median : 910 Mode :character
Mean :2022-10-05 05:46:07 Mean :1015
3rd Qu.:2023-01-28 06:45:00 3rd Qu.:1358
Max. :2023-05-24 23:55:00 Max. :1805
checkInTime checkOutTime
Min. :2022-03-01 05:35:00 Min. :2022-03-01 06:00:00
1st Qu.:2022-06-10 18:10:00 1st Qu.:2022-06-10 21:40:00
Median :2022-10-03 19:00:00 Median :2022-10-03 22:47:30
Mean :2022-10-05 05:46:07 Mean :2022-10-05 09:53:15
3rd Qu.:2023-01-28 06:45:00 3rd Qu.:2023-01-28 08:30:00
Max. :2023-05-24 23:55:00 Max. :2023-05-25 00:05:00
startingBalance endingBalance
Min. : -681.6 Min. : -640.7
1st Qu.: 5077.8 1st Qu.: 5086.4
Median : 12006.9 Median : 12019.5
Mean : 19573.7 Mean : 19590.8
3rd Qu.: 25972.4 3rd Qu.: 25992.5
Max. :240494.7 Max. :240838.8
summary(employers)
employerId location buildingId
Min. : 379 Length:253 Min. : 3.0
1st Qu.: 829 Class :character 1st Qu.: 261.0
Median :1279 Mode :character Median : 486.0
Mean :1089 Mean : 517.8
3rd Qu.:1734 3rd Qu.: 782.0
Max. :1797 Max. :1041.0
Some of the new time-related fields have been added to travel with the following code chunk:
Data frames travel and employers are saved in RDS format to avoid uploading large files to Git.
We will first examine the overall trend of participants’ travels for various purpose in travel.
ggplot(data=travel,
aes(x = travelStartTime,
fill = purpose)) +
geom_histogram(bins=15,
color="black") +
scale_y_continuous(NULL,
breaks = NULL) +
labs(y= 'Travel Count', x= 'Time',
title = "Travel Trend by Purpose Over Time") +
theme(axis.title.y= element_text(angle=0), axis.ticks.x= element_blank(),
axis.line= element_line(color= 'grey'))
We have transformed the data accordingly and displayed a trellis plot partitioned by purpose in order to look at travel trend for respective purpose.
qplot(year_month, count, data = travel_count) +
facet_wrap(~ purpose) +
labs(y= 'Travel Count', x= 'Time',
title = "Travel Trend for Respective Purpose Over Time") +
theme(axis.title.y= element_text(angle=0), axis.ticks.x= element_blank(), axis.text.x=element_blank(),
axis.line= element_line(color= 'grey'))
We will zoom into work travel patterns and both travel and
employers are joined using inner_join() function
of dplyr.
travel_to_work <- travel %>%
filter(purpose == 'Work/Home Commute') %>%
inner_join(y=employers, by = c("travelEndLocationId" = "employerId")) %>%
select(participantId, travelEndTime, year_month, day, wkday, travelEndLocationId, purpose, location, buildingId) %>%
rename('employerId' = 'travelEndLocationId')
Renaming has been performed simplicity using the code chunk below:
travel_to_work$purpose <- sub('Work/Home Commute',
'Work',
travel_to_work$purpose)
We will now look at daily trend of work travel patterns for each month.
ggplot() +
geom_line(data=travel_to_work_count,
aes(x=day,
y=count,
group=year_month),
colour="black") +
facet_grid(~year_month) +
labs(y= 'Travel \nCount', x= 'Month/Day',
title = "Daily Travel Trend across Month") +
theme(axis.title.y= element_text(angle=0),
axis.line= element_line(color= 'grey'))
Due to the cluttered-ness of the previous plot, we will aim to improve the display using a trellis dot plot.
qplot(day, count, data = travel_to_work_count) +
facet_wrap(~ year_month) +
labs(y= 'Travel \nCount', x= 'Month/Day',
title = "Daily Travel Trend across Month") +
theme(axis.title.y= element_text(angle=0), axis.text.x=element_blank(), axis.ticks.x=element_blank(),
axis.line= element_line(color= 'grey'))
Since the facet function of ggplot2 is not useful for visualizing large data, we will use trelliscopejs instead.
qplot(day, count, data = travel_to_work_count) +
facet_trelliscope(~ year_month, nrow = 2, ncol = 4, width = 600,
path = "trellis/",
self_contained=TRUE) +
labs(y= 'Travel \nCount', x= 'Day') +
theme(axis.title.y= element_text(angle=0), axis.text.x=element_blank(), axis.ticks.x=element_blank(),
axis.line= element_line(color= 'grey'))
We will again transform the data accordingly to look at monthly work
travel trend using geom_bar_interactive() of
ggiraph.
travel_to_work_by_month$tooltip <- c(paste0(
"Purpose = ", travel_to_work_by_month$purpose,
"\n Count = ", travel_to_work_by_month$count))
p <- ggplot(data=travel_to_work_by_month,
aes(x = year_month, y = count)) +
labs(y= 'Travel Count', x= 'Month',
title = "Monthly Travel Trend Over Time") +
theme(axis.title.y= element_text(angle=0), axis.ticks.x= element_blank(),
axis.line= element_line(color= 'grey')) +
geom_bar_interactive(aes(tooltip = travel_to_work_by_month$tooltip,
data_id = year_month),
stat="identity")
girafe(
ggobj = p,
width_svg = 12,
height_svg = 12*0.618
)
We will transform the data to compute for month-over-month turnover rate and join with the previous plot to form coordinated mutiple views.
travel_to_work_monthly_change <- travel_to_work_daily_count %>%
group_by(employerId, year_month) %>%
summarise(monthly_employees = max(count)) %>%
mutate(mom_change = coalesce(monthly_employees - lead(monthly_employees),0),
mom_turnover_rate = coalesce((monthly_employees - lead(monthly_employees))/monthly_employees,0)) %>%
ungroup()
travel_to_work_mom$tooltip <- c(paste0(
"MOM Turnover % =", round(travel_to_work_mom$avg_turnover*100,1), '%'))
p2 <- ggplot(data=travel_to_work_mom,
aes(x = year_month, y = avg_turnover)) +
labs(y= 'Turnover %', x= 'Month',
title = "Monthly Turnover Trend Over Time") +
theme(axis.title.y= element_text(angle=0), axis.ticks.x= element_blank(),
axis.line= element_line(color= 'grey')) +
geom_bar_interactive(aes(tooltip = travel_to_work_by_month$tooltip,
data_id = year_month),
stat="identity") +
geom_bar_interactive(aes(tooltip = travel_to_work_mom$tooltip,
data_id = year_month),
stat="identity")
girafe(code = print(p / p2),
width_svg = 12,
height_svg = 12,
options = list(
opts_hover(css = "fill: #202020;"),
opts_hover_inv(css = "opacity:0.2;")
)
)
We will now look at the calendar heatmap of travel patterns by wkday
across month using geom_tile() as well as
ggplotly.
p3 <- ggplot(travel_to_work_by_day,
aes(year_month,
wkday,
fill = daily_employees)) +
geom_tile(color = "white",
size = 0.1) +
theme_tufte(base_family = "Helvetica") +
coord_equal() +
scale_fill_gradient(name = "# of travels",
low = "sky blue",
high = "dark blue") +
labs(y= NULL, x= 'Month',
title = "Wkday-ly Travel Trend across Month") +
theme(axis.ticks = element_blank(),
axis.text.x = element_text(size = 7),
plot.title = element_text(hjust = 0.5),
legend.title = element_text(size = 8),
legend.text = element_text(size = 6) )
ggplotly(p3)
We will rename year_month values to 1-15 periods in order to plot an animated bubble plot.
p4 <- ggplot(travel_to_work_initial, aes(x = mom_turnover_rate, y = strength_level,
size = monthly_employees,
colour = employerId)) +
geom_point(alpha = 0.7,
show.legend = FALSE) +
scale_size(range = c(2, 12)) +
labs(title = 'Period cumulative by year_month: {frame_time}',
x = 'Turnover %',
y = 'Employee %') +
transition_time(year_month) +
ease_aes('linear')
animate(p4, nframes = 100, fps = 3)